Topic Indexing with Wikipedia
Abstract
Wikipedia can be used as a controlled vocabulary for identifying the main topics in a document, with article titles serving as index terms and redirect titles as their synonyms. Wikipedia contains over 4 million such titles, covering the terminology of nearly any document collection. This permits controlled indexing in the absence of manually created vocabularies. We combine state-of-the-art strategies for automatic controlled indexing with Wikipedia’s unique property: it is a richly hyperlinked encyclopedia. We evaluate the scheme by comparing automatically assigned topics with those chosen manually by human indexers. Analysis of indexing consistency shows that our algorithm performs as well as the average person.
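The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the vocabulary below is a hypothetical toy sample, the n-gram matching is a simplification, and the consistency function uses Rolling's measure, 2C / (A + B), which is one common inter-indexer consistency measure assumed here because the abstract does not name the one used.

```python
import re
from typing import Dict, Iterable, List

# Hypothetical toy vocabulary (not real Wikipedia data):
# article titles are the index terms, redirect titles are their synonyms.
ARTICLE_TITLES = ["Information retrieval", "Machine learning", "Wikipedia"]
REDIRECTS = {
    "IR": "Information retrieval",
    "Statistical learning": "Machine learning",
}


def build_lookup(titles: Iterable[str], redirects: Dict[str, str]) -> Dict[str, str]:
    """Map every lowercased title or redirect to its canonical article title."""
    lookup = {title.lower(): title for title in titles}
    for alias, target in redirects.items():
        lookup[alias.lower()] = target
    return lookup


def candidate_topics(text: str, lookup: Dict[str, str], max_len: int = 3) -> List[str]:
    """Return canonical titles whose title or redirect occurs as a word n-gram."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    found: List[str] = []
    # Try longer phrases first, then shorter ones.
    for n in range(max_len, 0, -1):
        for i in range(len(words) - n + 1):
            topic = lookup.get(" ".join(words[i:i + n]))
            if topic is not None and topic not in found:
                found.append(topic)
    return found


def rolling_consistency(topics_a: Iterable[str], topics_b: Iterable[str]) -> float:
    """Rolling's measure: 2 * |A and B| / (|A| + |B|); 0.0 if both sets are empty."""
    a, b = set(topics_a), set(topics_b)
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


lookup = build_lookup(ARTICLE_TITLES, REDIRECTS)
auto_topics = candidate_topics("Statistical learning and IR both draw on Wikipedia.", lookup)
human_topics = ["Wikipedia", "Information retrieval"]  # hypothetical human indexer
score = rolling_consistency(auto_topics, human_topics)
```

A real system would also need to rank the candidates and keep only the most significant ones; this sketch only covers the vocabulary lookup and the comparison against a human-assigned topic set.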
Similar resources
Towards Conceptual Indexing of the Blogosphere through Wikipedia Topic Hierarchy
This paper studies the issue of conceptually indexing the blogosphere through the whole hierarchy of Wikipedia entries. About 300,000 Wikipedia entries are used for representing a hierarchy of topics. Based on the results of judging whether each blog feed is relevant to a given Wikipedia entry, this paper proposes how to judge whether there exist blog feeds to be linked from the given entry. In...
Information filtering based on wiki index database
In this paper we present a profile-based approach to information filtering by an analysis of the content of text documents. The Wikipedia index database is created and used to automatically generate the user profile from the user’s document collection. The problem-oriented Wikipedia subcorpora are created (using knowledge extracted from the user profile) for each topic of user interests. The in...
Introduction to the RKEA Package
A short introduction to the RKEA package. The RKEA package provides an R interface to Kea (http://www.nzdl.org/Kea/), a tool for keyword extraction in texts. See https://code.google.com/p/kea-algorithm/ and http://www.nzdl.org/Kea/Download/Kea-5.0-Readme.txt for further information on Kea. Note that Maui (http://maui-indexer.googlecode.com/), an algorithm for topic indexing, can be...
Human-competitive automatic topic indexing
Topic indexing is the task of identifying the main topics covered by a document. These are useful for many purposes: as subject headings in libraries, as keywords in academic publications and as tags on the web. Knowing a document’s topics helps people judge its relevance quickly. However, assigning topics manually is labor intensive. This thesis shows how to generate them automatically in a wa...
Clustering Document with Active Learning using Wikipedia
Wikipedia has been applied as a background knowledge base to various text mining problems, including document categorization, topic indexing and information extraction. However, very few attempts have been made to utilize it for document clustering. In this paper we propose to exploit Wikipedia and the semantic knowledge therein to facilitate clustering, enabling the automatic grouping of docum...
Publication date: 2008